Analysis for Car_Sales_Data
In this project, I will analyze a car sales dataset that records several properties of each car.
Car_Sales_Data Field Description
- Below is a description of column fields in the dataset:
Manufacturer: The brand or company that made the car (e.g., Ford, Toyota, VW, Porsche).
Model: The specific model name of the car (e.g., Fiesta, Golf, Prius).
Engine size: The size of the engine in liters (e.g., 1.6 = 1.6-liter engine).
Fuel type: The type of fuel the car uses (e.g., Petrol).
Year of manufacture: The year the car was built.
Mileage: The total distance the car has driven, measured in kilometers.
Price: The current price of the car
Questions to Be Answered by the Analysis
What is the relationship between engine size and car price?
What is the relationship between car model and price?
What is the relationship between engine size and mileage?
What is the relationship between year of manufacture and car price?
What is the relationship between car model and engine size?
## load needed modules
import pandas as pd
## display all data columns
pd.options.display.max_columns=None
## load the dataset into DataFrame
df=pd.read_csv(r"C:\Users\DR SYSTEM\Downloads\car_sales_data.csv")
## display first rows
df.head(2)
| | Manufacturer | Model | Engine size | Fuel type | Year of manufacture | Mileage | Price |
|---|---|---|---|---|---|---|---|
| 0 | Ford | Fiesta | 1.0 | Petrol | 2002 | 127300 | 3074 |
| 1 | Porsche | 718 Cayman | 4.0 | Petrol | 2016 | 57850 | 49704 |
##check for DataFrame shape
df.shape
(50000, 7)
- The data has 50,000 rows (50k) and 7 columns.
## check for data info (quality)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Manufacturer         50000 non-null  object
 1   Model                50000 non-null  object
 2   Engine size          50000 non-null  float64
 3   Fuel type            50000 non-null  object
 4   Year of manufacture  50000 non-null  int64
 5   Mileage              50000 non-null  int64
 6   Price                50000 non-null  int64
dtypes: float64(1), int64(3), object(3)
memory usage: 2.7+ MB
#list all data columns
df.columns
Index(['Manufacturer', 'Model', 'Engine size', 'Fuel type',
'Year of manufacture', 'Mileage', 'Price'],
dtype='object')
Feature Engineering
- add High Mileage Flag column
- add car age column
- add price per km column
#copy the dataframe
df_copy=df.copy()
#check for duplicates
df.duplicated().sum()
12
- There are 12 duplicated rows.
#drop duplicates
df.drop_duplicates(inplace=True)
#check
df.duplicated().sum()
0
#check for null values
df.isnull().sum()
Manufacturer           0
Model                  0
Engine size            0
Fuel type              0
Year of manufacture    0
Mileage                0
Price                  0
dtype: int64
- There are no null values.
#check for data size
df.shape
(49988, 7)
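#add High Mileage Flag column
#note: a 1500 km threshold labels almost every car "high" (the 25th percentile of Mileage is about 54375)
df['High Mileage Flag']=df['Mileage'].apply(lambda x : "high" if x >1500 else "low")
#add car age column
# load needed modules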
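The flag above is computed row by row with `apply`. A vectorized variant might look like the sketch below. It assumes a small toy DataFrame rather than the real dataset, and it takes the cut-off from the 75th percentile of `Mileage` instead of a fixed number; the helper name `flag_high_mileage`, the toy values, and the percentile choice are illustrative assumptions and not part of the original analysis.

```python
import numpy as np
import pandas as pd


def flag_high_mileage(mileage: pd.Series, threshold: float) -> pd.Series:
    """Label each mileage value 'high' if it exceeds threshold, else 'low'."""
    labels = np.where(mileage > threshold, "high", "low")
    return pd.Series(labels, index=mileage.index, name="High Mileage Flag")


# Toy data (hypothetical values, not the real dataset)
toy = pd.DataFrame({"Mileage": [630, 54375, 101011, 158617, 453537]})

# Assumption: use the 75th percentile of Mileage as the cut-off
threshold = toy["Mileage"].quantile(0.75)
toy["High Mileage Flag"] = flag_high_mileage(toy["Mileage"], threshold)
print(toy)
```

A data-driven threshold keeps both labels populated, whereas a fixed cut-off far below the typical mileage puts nearly every row in one class.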
from datetime import datetime
current_year=datetime.now().year
df['car_age']=current_year-df['Year of manufacture']
print(df[['Model','Year of manufacture','car_age']].head())
        Model  Year of manufacture  car_age
0      Fiesta                 2002       23
1  718 Cayman                 2016        9
2      Mondeo                 2014       11
3        RAV4                 1988       37
4        Polo                 2006       19
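Because `datetime.now().year` changes over time, the `car_age` values will differ when the notebook is rerun in a later year. One way to keep the output reproducible is to pin the reference year, as in this sketch. `REFERENCE_YEAR = 2025` is an assumption inferred from the output above (2002 gives an age of 23), and `add_car_age` is a hypothetical helper name.

```python
import pandas as pd

# Assumption: 2025 is the year in which the output above was produced
REFERENCE_YEAR = 2025


def add_car_age(frame: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy of frame with a car_age column relative to reference_year."""
    out = frame.copy()
    out["car_age"] = reference_year - out["Year of manufacture"]
    return out


# The five rows printed above
sample = pd.DataFrame({
    "Model": ["Fiesta", "718 Cayman", "Mondeo", "RAV4", "Polo"],
    "Year of manufacture": [2002, 2016, 2014, 1988, 2006],
})
aged = add_car_age(sample)
print(aged)
```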
#add price per km column
df['price per km']=df['Price'] / df['Mileage']
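In this dataset the smallest `Mileage` is 630, so the division never hits zero. If the same step were applied to data that might contain zero-mileage cars, a guarded version could look like the sketch below; the helper name `price_per_km` and the third toy row are hypothetical.

```python
import numpy as np
import pandas as pd


def price_per_km(price: pd.Series, mileage: pd.Series) -> pd.Series:
    """Divide price by mileage, yielding NaN where mileage is zero."""
    safe_mileage = mileage.replace(0, np.nan)
    return price / safe_mileage


# First two rows come from the dataset preview; the third is a made-up zero-mileage car
toy = pd.DataFrame({
    "Price": [3074, 49704, 20000],
    "Mileage": [127300, 57850, 0],
})
toy["price per km"] = price_per_km(toy["Price"], toy["Mileage"])
print(toy)
```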
#check DataFrame
df.head(1)
| | Manufacturer | Model | Engine size | Fuel type | Year of manufacture | Mileage | Price | High Mileage Flag | car_age | price per km |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ford | Fiesta | 1.0 | Petrol | 2002 | 127300 | 3074 | high | 23 | 0.024148 |
df.describe()
| | Engine size | Year of manufacture | Mileage | Price | car_age | price per km |
|---|---|---|---|---|---|---|
| count | 49988.000000 | 49988.000000 | 49988.000000 | 49988.000000 | 49988.000000 | 49988.000000 |
| mean | 1.773140 | 2004.209630 | 112515.561215 | 13829.112387 | 20.790370 | 0.492719 |
| std | 0.734149 | 9.646056 | 71624.341062 | 16417.812203 | 9.646056 | 1.844404 |
| min | 1.000000 | 1984.000000 | 630.000000 | 76.000000 | 3.000000 | 0.000180 |
| 25% | 1.400000 | 1996.000000 | 54375.250000 | 3059.750000 | 13.000000 | 0.020147 |
| 50% | 1.600000 | 2004.000000 | 101011.500000 | 7971.000000 | 21.000000 | 0.081169 |
| 75% | 2.000000 | 2012.000000 | 158617.250000 | 19028.500000 | 29.000000 | 0.349427 |
| max | 5.000000 | 2022.000000 | 453537.000000 | 168081.000000 | 41.000000 | 113.993976 |
Q1: What is the relationship between engine size and car price?
#load needed modules
import plotly.express as px
px.scatter(df,x='Engine size',y='Price',trendline='ols')
As engine size increases, the car price tends to increase.
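The trend read off the scatter plot can also be expressed as numbers. The sketch below computes a Pearson correlation and a straight-line fit on a toy DataFrame; the values are hypothetical and only show the mechanics, so the printed figures say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "Engine size": [1.0, 1.4, 1.6, 2.0, 4.0],
    "Price": [3000, 6000, 8000, 12000, 50000],
})

# Pearson correlation between the two columns
r = toy["Engine size"].corr(toy["Price"])

# Least-squares line: Price ~ slope * Engine size + intercept
slope, intercept = np.polyfit(toy["Engine size"], toy["Price"], deg=1)

print(f"Pearson r: {r:.3f}")
print(f"Estimated price change per extra litre: {slope:.0f}")
```

Running the same two calls on `df` would give a figure to quote alongside the plot.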
Q2: What is the relationship between car model and price?
#load needed modules
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(data=df,x='Price',y='Model')
plt.title('Model vs Price')
plt.xlabel('Price')
plt.ylabel('Model')
plt.show()
Luxury and sports models like Porsche 911, M5, and Cayenne tend to have much higher prices compared to other models like Fiesta, Yaris, or Polo.
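A per-model summary table makes this comparison easier to read than a scatter plot with one row of points per model. The following sketch groups a toy DataFrame by `Model` and ranks models by median price; apart from the two prices taken from the dataset preview, the values are hypothetical.

```python
import pandas as pd

# Toy data: 3074 and 49704 come from the dataset preview, the rest is made up
toy = pd.DataFrame({
    "Model": ["Fiesta", "Fiesta", "718 Cayman", "718 Cayman", "Polo"],
    "Price": [3074, 5000, 49704, 60000, 4000],
})

# Median, mean and row count per model, most expensive first
price_by_model = (
    toy.groupby("Model")["Price"]
    .agg(["median", "mean", "count"])
    .sort_values("median", ascending=False)
)
print(price_by_model)
```

The median is used for ranking because it is less sensitive to a few very expensive cars than the mean.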
Q3: What is the relationship between engine size and mileage?
##load needed modules
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
columns=['Engine size','Mileage']
correlation=df[columns].corr()
print(correlation)
sns.heatmap(correlation,annot=True)
plt.title('the relationship between engine size and mileage')
plt.show()
             Engine size   Mileage
Engine size     1.000000  0.004365
Mileage         0.004365  1.000000
There is almost no linear correlation between engine size and mileage (r ≈ 0.004).
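`DataFrame.corr()` defaults to the Pearson coefficient, which only measures linear association. A rank-based check (Spearman correlation, computed here as the Pearson correlation of the ranks) can reveal a monotonic but non-linear relationship. The sketch below uses a deliberately non-linear toy dataset with hypothetical values to show how the two measures can differ; it is not a result from the real data.

```python
import pandas as pd

# Toy data (hypothetical): Mileage grows monotonically but not linearly with Engine size
toy = pd.DataFrame({
    "Engine size": [1.0, 1.4, 1.6, 2.0, 4.0],
    "Mileage": [1000, 8000, 27000, 64000, 125000],
})

# Pearson: linear association
pearson_r = toy["Engine size"].corr(toy["Mileage"])

# Spearman: Pearson correlation applied to the ranks of each column
spearman_r = toy["Engine size"].rank().corr(toy["Mileage"].rank())

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```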
Q4: What is the relationship between year of manufacture and car price?
#load needed modules
import plotly.express as px
px.scatter(df,x= 'Year of manufacture',y='Price',trendline='ols')
As the year of manufacture increases (newer cars), the car price tends to increase.
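To look at year of manufacture against price numerically, one option is the median price per year together with a correlation coefficient. The sketch below does this on a toy DataFrame; all values except the price 3074 for 2002 and 49704 for 2016 (taken from the dataset preview) are hypothetical.

```python
import pandas as pd

# Toy data: mostly hypothetical values, not the real dataset
toy = pd.DataFrame({
    "Year of manufacture": [1988, 1988, 2002, 2002, 2016, 2016],
    "Price": [900, 1100, 3074, 4000, 49704, 30000],
})

# Median price for each year of manufacture
median_price_by_year = toy.groupby("Year of manufacture")["Price"].median()

# Pearson correlation between year and price
year_price_corr = toy["Year of manufacture"].corr(toy["Price"])

print(median_price_by_year)
print(f"Pearson r: {year_price_corr:.3f}")
```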
Q5: What is the relationship between car model and engine size?
#load needed modules
import seaborn as sns
sns.barplot(data=df,x='Engine size' , y='Model')
plt.title('relationship between car model and engine size')
plt.xlabel('Engine size')
plt.ylabel('Model')
plt.show()
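The bar plot shows the mean engine size per model. The same information can be tabulated, along with the range of engine sizes offered for each model, as in this sketch on a toy DataFrame with hypothetical values.

```python
import pandas as pd

# Toy data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "Model": ["Fiesta", "Fiesta", "718 Cayman", "718 Cayman", "Polo"],
    "Engine size": [1.0, 1.4, 4.0, 2.5, 1.0],
})

# Smallest, average and largest engine size per model, largest average first
engine_by_model = (
    toy.groupby("Model")["Engine size"]
    .agg(["min", "mean", "max"])
    .sort_values("mean", ascending=False)
)
print(engine_by_model)
```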
Conclusion
- The data has 50,000 rows (50k) and 7 columns.
- As engine size increases, the car price tends to increase
- Luxury and sports models like Porsche 911, M5, and Cayenne tend to have much higher prices compared to other models like Fiesta, Yaris, or Polo.
- There is almost no correlation between engine size and mileage
- As the year of manufacture increases (newer cars), the car price tends to increase.